Overview of the 22nd International Conference on Informatics in Control, Automation and Robotics
ICINCO 2025 (22nd International Conference on Informatics in Control, Automation and Robotics) received 158 paper submissions from 42 countries. The Program Committee performed a double-blind review of each submission. After a stringent selection process, 43 papers were published and presented as full papers, i.e. completed work (12 pages / 25-minute oral presentation), and 86 papers were accepted as short papers (51 of them as oral presentations). The organizing committee included the ICINCO Conference Chair, Dimitar Filev (Ford Research, United States), and the ICINCO 2025 Program Chairs, Giuseppina Carla Gini (Politecnico di Milano, Italy) and Radu-Emil Precup (Politehnica University of Timisoara, Romania). At the closing session, the conference recognized a few papers considered excellent in their class, presenting a "Best Paper Award", "Best Student Paper Award", "Best Poster Award", and "Best Industrial Paper Award".
- North America > United States (0.26)
- Europe > Romania > Vest Development Region > Timiș County > Timișoara (0.26)
- Europe > Italy > Lombardy > Milan (0.26)
- (4 more...)
Is General-Purpose AI Reasoning Sensitive to Data-Induced Cognitive Biases? Dynamic Benchmarking on Typical Software Engineering Dilemmas
Sovrano, Francesco, Dominici, Gabriele, Sevastjanova, Rita, Stramiglio, Alessandra, Bacchelli, Alberto
Human cognitive biases in software engineering can lead to costly errors. While general-purpose AI (GPAI) systems may help mitigate these biases due to their non-human nature, their training on human-generated data raises a critical question: Do GPAI systems themselves exhibit cognitive biases? To investigate this, we present the first dynamic benchmarking framework to evaluate data-induced cognitive biases in GPAI within software engineering workflows. Starting with a seed set of 16 hand-crafted realistic tasks, each featuring one of 8 cognitive biases (e.g., anchoring, framing) and corresponding unbiased variants, we test whether bias-inducing linguistic cues unrelated to task logic can lead GPAI systems from correct to incorrect conclusions. To scale the benchmark and ensure realism, we develop an on-demand augmentation pipeline relying on GPAI systems to generate task variants that preserve bias-inducing cues while varying surface details. This pipeline ensures correctness (88-99% on average, according to human evaluation), promotes diversity, and controls reasoning complexity by leveraging Prolog-based reasoning. We evaluate leading GPAI systems (GPT, LLaMA, DeepSeek) and find a consistent tendency to rely on shallow linguistic heuristics over more complex reasoning. All systems exhibit bias sensitivity (6-35%), which increases with task complexity (up to 49%) and highlights risks in AI-driven software engineering.
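The paired-variant probe described above (a biased and an unbiased phrasing of the same task, differing only in linguistic cues) can be sketched as follows. This is an illustrative reconstruction, not the paper's benchmark: `query_model` is a hypothetical stand-in for a GPAI call, and the example task is invented.

```python
# Sketch of the paired-variant bias probe: each task has a biased and an
# unbiased phrasing with the same ground truth; bias sensitivity is the
# fraction of pairs where the model answers the unbiased variant correctly
# but flips to an incorrect answer on the biased one.

def bias_sensitivity(pairs, query_model):
    """pairs: list of (unbiased_prompt, biased_prompt, ground_truth)."""
    flips = eligible = 0
    for unbiased, biased, truth in pairs:
        if query_model(unbiased) == truth:      # model solves the clean task
            eligible += 1
            if query_model(biased) != truth:    # the cue alone induced the error
                flips += 1
    return flips / eligible if eligible else 0.0

# Toy stand-in model that anchors on a number mentioned in the prompt.
def toy_model(prompt):
    return "reject" if "the last reviewer estimated 40 hours" in prompt else "accept"

pairs = [
    ("Estimate: is this 8-hour task feasible this sprint?",
     "Estimate: is this 8-hour task feasible this sprint? "
     "Note: the last reviewer estimated 40 hours.",
     "accept"),
]
print(bias_sensitivity(pairs, toy_model))  # 1.0: the anchoring cue flips the answer
```

Counting flips only among tasks the model already solves in the unbiased form isolates the effect of the cue from plain task difficulty.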
- Europe > Switzerland > Zürich > Zürich (1.00)
- North America > United States > Florida > Miami-Dade County > Miami (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Simulation of Human Behavior (0.98)
- Europe > Switzerland > Zürich > Zürich (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > Romania > Vest Development Region > Timiș County > Timișoara (0.04)
- (4 more...)
- Information Technology > Security & Privacy (1.00)
- Government (0.68)
Foundation of Intelligence: Review of Math Word Problems from Human Cognition Perspective
Huang, Zhenya, Liu, Jiayu, Lin, Xin, Ma, Zhiyuan, Xue, Shangzi, Xiao, Tong, Liu, Qi, Teh, Yee Whye, Chen, Enhong
The math word problem (MWP) has been a fundamental research topic in artificial intelligence (AI) since the 1960s. This line of research aims to advance the reasoning abilities of AI by mirroring human-like cognitive intelligence. The mainstream technological paradigm has evolved from early rule-based methods, through deep learning models, and is now rapidly advancing toward large language models. However, the field still lacks a systematic taxonomy of MWP research along with a discussion of current development trends. In this paper, we therefore comprehensively review research on MWP solving through the lens of human cognition, to demonstrate how recent AI models are advancing in simulating human cognitive abilities. Specifically, we summarize five cognitive abilities crucial for MWP solving: Problem Understanding, Logical Organization, Associative Memory, Critical Thinking, and Knowledge Learning. Focusing on these abilities, we review the two mainstream families of MWP solvers of the past ten years, neural network solvers and LLM-based solvers, and discuss the core human-like abilities they demonstrate in their intricate problem-solving processes. Moreover, we rerun all the representative MWP solvers and report their performance on five mainstream benchmarks for a unified comparison. To the best of our knowledge, this is the first survey to comprehensively analyze the influential MWP research of the past decade from the perspective of human reasoning cognition and to provide an integrative comparison across existing approaches. We hope it can inspire further research in AI reasoning. Our repository is released at https://github.com/Ljyustc/FoI-MWP.
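The early rule-based paradigm that the survey contrasts with neural and LLM-based solvers can be illustrated with a toy pattern-matching solver. This sketch is invented for illustration, not taken from the paper, and it shows exactly the brittleness that motivated later paradigms: cue words are mapped directly to arithmetic operations.

```python
import re

# Minimal rule-based MWP solver: extract the two numbers in the problem
# and map a lexical cue word to an arithmetic operation.
RULES = [
    (r"\baltogether\b|\bin all\b|\btotal\b", lambda a, b: a + b),
    (r"\bleft\b|\bremain", lambda a, b: a - b),
    (r"\btimes\b", lambda a, b: a * b),
]

def solve(problem):
    nums = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", problem)]
    if len(nums) != 2:
        return None  # the rules only cover two-operand problems
    for pattern, op in RULES:
        if re.search(pattern, problem.lower()):
            return op(*nums)
    return None

print(solve("Tom has 3 apples and buys 4 more. How many apples in all?"))  # 7.0
print(solve("Ann had 9 pens and gave away 2. How many are left?"))         # 7.0
```

Any problem whose wording falls outside the cue list returns `None`, which is precisely the coverage limitation that deep learning and LLM solvers address.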
- North America > Canada (0.04)
- Asia > Singapore (0.04)
- Asia > China > Anhui Province > Hefei (0.04)
- (5 more...)
- Workflow (1.00)
- Research Report (1.00)
- Overview (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- (3 more...)
Comparing human and LLM proofreading in L2 writing: Impact on lexical and syntactic features
Sung, Hakyung, Csuros, Karla, Sung, Min-Chang
This study examines the lexical and syntactic interventions of human and LLM proofreading aimed at improving the overall intelligibility of identical second-language (L2) writings, and evaluates the consistency of outcomes across three LLMs (ChatGPT-4o, Llama3.1-8b, Deepseek-r1-8b). Findings show that both human and LLM proofreading enhance bigram lexical features, which may contribute to better coherence and contextual connectedness between adjacent words. However, LLM proofreading takes a more generative approach, extensively reworking vocabulary and sentence structure, for example by employing more diverse and sophisticated vocabulary and incorporating a greater number of adjective modifiers in noun phrases. The proofreading outcomes are highly consistent in major lexical and syntactic features across the three models.
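The kind of bigram lexical measure the study tracks can be approximated with a bigram type-token ratio. This is a simplified sketch with invented example texts; real analyses of this sort typically use association measures (e.g., mutual information) computed against reference corpora.

```python
def bigram_ttr(text):
    """Bigram type-token ratio: distinct adjacent word pairs / total pairs."""
    words = text.lower().split()
    bigrams = list(zip(words, words[1:]))
    return len(set(bigrams)) / len(bigrams) if bigrams else 0.0

original  = "the result is good and the result is good for the study"
proofread = "the findings are promising and benefit the study considerably"

print(round(bigram_ttr(original), 2))   # repeated word pairs lower the ratio
print(round(bigram_ttr(proofread), 2))  # more varied phrasing raises it
```

A higher ratio after proofreading indicates less repetition of adjacent word pairs, one rough proxy for the bigram-level diversity gains the study reports.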
- North America > United States > Oregon (0.40)
- Asia > Taiwan (0.04)
- Asia > Japan (0.04)
- (13 more...)
Learning to Guarantee Type Correctness in Code Generation through Type-Guided Program Synthesis
Huang, Zhechong, Zhang, Zhao, Ji, Ruyi, Xia, Tingxuan, Zhu, Qihao, Cao, Qinxiang, Sun, Zeyu, Xiong, Yingfei
Language models have shown remarkable proficiency in code generation; nevertheless, ensuring type correctness remains a challenge. Although traditional methods, such as constrained decoding, alleviate this problem by externally rejecting untypable code, the model itself does not effectively learn type reasoning internally, which ultimately limits its overall performance. This paper introduces TyFlow, a novel system that internalizes type reasoning within code generation to guide the model to learn the type system. The core of our approach is a novel type-guided program synthesis system that maintains an isomorphism between type derivation trees and synthesis derivation trees, enabling a new code representation based on synthesis decision sequences rather than traditional text-based token sequences. By offloading the complexity of type system learning to the representation itself, models can redirect their computational resources toward higher-level program semantics. Our evaluation shows that TyFlow not only eliminates type errors but also significantly improves functional correctness, highlighting the importance of aligning LMs with type systems internally.
- Europe > Austria > Vienna (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- (16 more...)
- Workflow (0.94)
- Research Report (0.82)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.83)
- (2 more...)
- North America > United States > New York (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (5 more...)
- Information Technology > Security & Privacy (1.00)
- Government (0.68)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
RoBiologyDataChoiceQA: A Romanian Dataset for Improving Biology Understanding of Large Language Models
Ghinea, Dragos-Dumitru, Corbeanu, Adela-Nicoleta, Dumitran, Adrian-Marius
In recent years, large language models (LLMs) have demonstrated significant potential across various natural language processing (NLP) tasks. However, their performance in domain-specific applications and non-English languages remains less explored. This study introduces a novel Romanian-language dataset for multiple-choice biology questions, carefully curated to assess LLM comprehension and reasoning capabilities in scientific contexts. Containing approximately 14,000 questions, the dataset provides a comprehensive resource for evaluating and improving LLM performance in biology. We benchmark several popular LLMs, analyzing their accuracy, reasoning patterns, and ability to understand domain-specific terminology and linguistic nuances. Additionally, we perform comprehensive experiments to evaluate the impact of prompt engineering, fine-tuning, and other optimization techniques on model performance. Our findings highlight both the strengths and limitations of current LLMs in handling specialized knowledge tasks in low-resource languages, offering valuable insights for future research and development.
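Benchmarking LLMs on a multiple-choice dataset like this reduces to extracting each model's chosen option from free-form output and scoring it against an answer key. A minimal sketch, with invented outputs and key (real harnesses need more robust answer extraction, especially for Romanian-language responses):

```python
import re

# Minimal multiple-choice scoring harness: pull the first standalone
# option letter (A-D) out of the model's free-form answer and compare
# it against the gold key.
def extract_choice(answer_text):
    m = re.search(r"\b([A-D])\b", answer_text.strip().upper())
    return m.group(1) if m else None

def accuracy(model_outputs, answer_key):
    correct = sum(extract_choice(o) == k for o, k in zip(model_outputs, answer_key))
    return correct / len(answer_key)

outputs = ["The answer is B.", "C", "I would pick a) ...", "Answer: D"]
key     = ["B", "C", "A", "B"]
print(accuracy(outputs, key))  # 0.75: three extracted letters match the key
```

The word-boundary `\b` anchors keep letters inside ordinary words (e.g., the "A" in "ANSWER") from being mistaken for an option choice.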
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (6 more...)
HISPASpoof: A New Dataset For Spanish Speech Forensics
Risques, Maria, Bhagtani, Kratika, Yadav, Amit Kumar Singh, Delp, Edward J.
West Lafayette, Indiana, USA. Zero-shot Voice Cloning (VC) and Text-to-Speech (TTS) methods have advanced rapidly, enabling the generation of highly realistic synthetic speech and raising serious concerns about their misuse. While numerous detectors have been developed for English and Chinese, Spanish, spoken by over 600 million people worldwide, remains underrepresented in speech forensics. To address this gap, we introduce HISPASpoof, the first large-scale Spanish dataset designed for synthetic speech detection and attribution. It includes real speech from public corpora across six accents and synthetic speech generated with six zero-shot TTS systems. We evaluate five representative methods, showing that detectors trained on English fail to generalize to Spanish, while training on HISPASpoof substantially improves detection. We also evaluate synthetic speech attribution performance on HISPASpoof, i.e., identifying the generation method of synthetic speech. HISPASpoof thus provides a critical benchmark for advancing reliable and inclusive speech forensics in Spanish. The rapid advancement of speech synthesis techniques has significantly transformed the area of audio generation and speech forensics. Recent Text-to-Speech (TTS) and Voice Cloning (VC) methods [1], [2], [3], [4], [5], [6] are now capable of producing highly realistic synthetic voices that closely mimic the spectral, prosodic, and linguistic traits of real human speech [7], [8], [9], [10].
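Evaluating a synthetic-speech detector on a dataset like HISPASpoof amounts to sweeping a decision threshold over the detector's scores and reporting performance at an operating point. A minimal sketch with invented scores (real evaluations typically report equal error rate rather than best-case accuracy):

```python
# Sketch of scoring a synthetic-speech detector: sweep a threshold over
# detector scores (higher = more likely synthetic) and report accuracy
# at the best operating point. The scores and labels below are invented.
def best_accuracy(scores, labels):
    """labels: 1 = synthetic speech, 0 = real speech."""
    best = 0.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        best = max(best, acc)
    return best

scores = [0.92, 0.81, 0.35, 0.10, 0.66, 0.20]
labels = [1,    1,    0,    0,    1,    0]
print(best_accuracy(scores, labels))  # 1.0: a threshold near 0.66 separates the classes
```

A detector trained on English and applied to Spanish would show overlapping score distributions for the two classes, so no threshold would separate them; this is the cross-lingual generalization failure the paper measures.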
- North America > United States > Indiana > Tippecanoe County > West Lafayette (0.24)
- North America > United States > Indiana > Tippecanoe County > Lafayette (0.24)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- (21 more...)
- Media (0.68)
- Information Technology > Security & Privacy (0.47)
- Government (0.46)
Quality control in sublinear time: a case study via random graphs
Marcussen, Cassandra, Rubinfeld, Ronitt, Sudan, Madhu
Many algorithms are designed to work well on average over inputs. When running such an algorithm on an arbitrary input, we must ask: Can we trust the algorithm on this input? We identify a new class of algorithmic problems addressing this, which we call "Quality Control Problems." These problems are specified by a (positive, real-valued) "quality function" $\rho$ and a distribution $D$ such that, with high probability, a sample drawn from $D$ is "high quality," meaning its $\rho$-value is near $1$. The goal is to accept inputs $x \sim D$ and reject potentially adversarially generated inputs $x$ with $\rho(x)$ far from $1$. The objective of quality control is thus weaker than either component problem: testing for "$\rho(x) \approx 1$" or testing if $x \sim D$, and offers the possibility of more efficient algorithms. In this work, we consider the sublinear version of the quality control problem, where $D \in \Delta(\{0,1\}^N)$ and the goal is to solve the $(D, \rho)$-quality problem with $o(N)$ queries and time. As a case study, we consider random graphs, i.e., $D = G_{n,p}$ (and $N = \binom{n}{2}$), and the $k$-clique count function $\rho_k := C_k(G)/\mathbb{E}_{G' \sim G_{n,p}}[C_k(G')]$, where $C_k(G)$ is the number of $k$-cliques in $G$. Testing if $G \sim G_{n,p}$ with one sample, let alone with sublinear query access to the sample, is of course impossible. Testing if $\rho_k(G) \approx 1$ requires $p^{-\Omega(k^2)}$ samples. In contrast, we show that the quality control problem for $G_{n,p}$ (with $n \geq p^{-ck}$ for some constant $c$) with respect to $\rho_k$ can be tested with $p^{-O(k)}$ queries and time, showing quality control is provably superpolynomially more efficient in this setting. More generally, for a motif $H$ of maximum degree $\Delta(H)$, the respective quality control problem can be solved with $p^{-O(\Delta(H))}$ queries and running time.
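The quality function $\rho_k$ can be made concrete for $k = 3$ (triangles): it is the triangle count of $G$ divided by its $G_{n,p}$ expectation, $\binom{n}{3} p^3$. The sketch below computes $\rho_3$ exactly by exhaustive enumeration, which is of course not the sublinear-query algorithm of the paper; it only illustrates that an honest $G_{n,p}$ sample scores near $1$.

```python
import itertools
import math
import random

# Exact (non-sublinear) illustration of the quality function rho_3:
# the triangle count of G divided by its G(n,p) expectation C(n,3)*p^3.
# An honest sample from G(n,p) should score close to 1.
def rho_3(edges, n, p):
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    triangles = sum(1 for a, b, c in itertools.combinations(range(n), 3)
                    if b in adj[a] and c in adj[a] and c in adj[b])
    expected = math.comb(n, 3) * p ** 3
    return triangles / expected

random.seed(0)
n, p = 60, 0.5
edges = [(u, v) for u, v in itertools.combinations(range(n), 2)
         if random.random() < p]
print(round(rho_3(edges, n, p), 2))  # close to 1 for a genuine G(n,p) sample
```

Triangle counts concentrate sharply around their mean in $G_{n,p}$ at this density, which is why a value far from $1$ is evidence that the input was not drawn honestly from the distribution.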
- Africa > Sudan (0.40)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > Washington > King County > Seattle (0.13)
- (17 more...)